Lecture 11
Positive autocorrelations out to a large number of lags usually indicates a need for differencing
Slightly too much or slightly too little differencing can be corrected by adding AR or MA terms respectively.
A model with no differencing usually includes a constant term, a model with two or more orders (rare) differencing usually does not include a constant term.
After differencing, if the PACF has a sharp cutoff then consider adding AR terms to the model.
After differencing, if the ACF has a sharp cutoff then consider adding an MA term to the model.
It is possible for an AR term and an MA term to cancel each other’s effects, so try models with fewer AR terms and fewer MA terms.
The fable package provides a number of different information criteria for model selection,
\[ \begin{aligned} AIC &= -2 \log \mathcal L + 2(p+q+k+1) \\ \\ AICc &= AIC + \frac{2(p+q+k+1)(p+q+k+2)}{n-p-q-k-2} \\ \\ BIC &= AIC + (\log (n) - 2)(p+q+k+1) \\ \end{aligned} \]
For small values of \(n\), AIC can overfit the data (select too many predictors) and so AICc is preferred. BIC is a more conservative version of AIC that penalizes the number of predictors more heavily.
Model choices:
model(
elec_sales,
ARIMA(value ~ pdq(3,1,0)),
ARIMA(value ~ pdq(2,1,0)),
ARIMA(value ~ pdq(4,1,0)),
ARIMA(value ~ pdq(3,1,1))
) |>
glance()# A tibble: 4 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 ARIMA(value ~ pdq(3, 1, 0)) 9.85 -486. 979. 980. 992. <cpl> <cpl>
2 ARIMA(value ~ pdq(2, 1, 0)) 10.9 -495. 997. 997. 1006. <cpl> <cpl>
3 ARIMA(value ~ pdq(4, 1, 0)) 9.78 -484. 979. 979. 995. <cpl> <cpl>
4 ARIMA(value ~ pdq(3, 1, 1)) 9.74 -484. 978. 978. 994. <cpl> <cpl>
Australian total wine sales by wine makers in bottles <= 1 litre. Jan 1980 – Aug 1994.
We can extend the existing ARIMA model to handle these higher order lags (without including all of the intervening lags).
Seasonal \(\text{ARIMA}\,(p,d,q) \times (P,D,Q)_s\): \[ \Phi_P(L^s) \, \phi_p(L) \, \Delta_s^D \, \Delta^d \, y_t = \delta + \Theta_Q(L^s) \, \theta_q(L) \, w_t\]
where
\[ \begin{aligned} \phi_p(L) &= 1-\phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p\\ \theta_q(L) &= 1+\theta_1 L + \theta_2 L^2 + \ldots + \theta_p L^q \\ \Delta^d &= (1-L)^d\\ \\ \Phi_P(L^s) &= 1-\Phi_1 L^s - \Phi_2 L^{2s} - \ldots - \Phi_P L^{Ps} \\ \Theta_Q(L^s) &= 1+\Theta_1 L + \Theta_2 L^{2s} + \ldots + \theta_p L^{Qs} \\ \Delta_s^D &= (1-L^s)^D\\ \end{aligned} \]
Lets consider an \(\text{ARIMA}(0,0,0) \times (0,1,0)_{12}\): \[ \begin{aligned} (1 - L^{12}) \, y_t = \delta + w_t \\ y_t = y_{t-12} + \delta + w_t \end{aligned} \]
\(\text{ARIMA}(0,0,0) \times (0,1,1)_{12}\):
\[ \begin{aligned} (1-L^{12}) y_t = \delta + (1+\Theta_1 L^{12}) w_t \\ y_t = \delta + y_{t-12} + w_t + \Theta_1 w_{t-12} \end{aligned} \]
m = model(
wineind,
m1 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,0, period=12)),
m2 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,1, period=12)),
)
glance(m)# A tibble: 2 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 7176802. -1527. 3057. 3057. 3064. <cpl [0]> <cpl [0]>
2 m2 6369103. -1517. 3041. 3041. 3050. <cpl [0]> <cpl [12]>
\(\text{ARIMA}(0,1,0) \times (0,1,1)_{12}\)
\[ \begin{aligned} (1-L) \, (1-L^{12}) y_t = \delta + (1 + \Theta_1 L)w_t \\ y_t = \delta + y_{t-1} + y_{t-12} - y_{t-13} + w_t + w_{t-12} \end{aligned} \]
m = model(
wineind,
m1 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,0, period=12)),
m2 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,1, period=12)),
m3 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,1, period=12))
)
glance(m)# A tibble: 3 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 7176802. -1527. 3057. 3057. 3064. <cpl [0]> <cpl [0]>
2 m2 6369103. -1517. 3041. 3041. 3050. <cpl [0]> <cpl [12]>
3 m3 10751355. -1553. 3109. 3109. 3115. <cpl [0]> <cpl [12]>
\(\text{ARIMA}(1,1,0) \times (0,1,1)_{12}\)
\[ (1-\phi_1 L) \, (1-L)\, (1-L^{12}) y_t = \delta + (1 + \Theta_1 L)w_t \\ y_t = \delta + (1+\phi_1) y_{t-1} - \phi_1 y_{t-2}+ y_{t-12} - (1+\phi_1) y_{t-13} + \phi_1 y_{t-14} + w_t + w_{t-12} \]
m = model(
wineind,
m1 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,0, period=12)),
m2 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,1, period=12)),
m3 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,1, period=12)),
m4 = ARIMA(value ~ pdq(1,1,0) + PDQ(0,1,1, period=12))
)
glance(m)# A tibble: 4 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 7176802. -1527. 3057. 3057. 3064. <cpl [0]> <cpl [0]>
2 m2 6369103. -1517. 3041. 3041. 3050. <cpl [0]> <cpl [12]>
3 m3 10751355. -1553. 3109. 3109. 3115. <cpl [0]> <cpl [12]>
4 m4 8267920. -1532. 3070. 3070. 3079. <cpl [1]> <cpl [12]>
\(\text{ARIMA}(1,1,2) \times (0,1,1)_{12}\)
\[ (1-\phi_1 L) \, (1-L)\, (1-L^{12}) y_t = \delta + (1+\theta_1 L+\theta_2 L^2)(1 + \Theta_1 L)w_t \\ y_t = \delta + (1+\phi_1) y_{t-1} - \phi_1 y_{t-2}+ y_{t-12} - (1+\phi_1) y_{t-13} + \phi_1 y_{t-14} \\ + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + w_{t-12} + \theta_1 w_{t-13} + \theta_2 w_{t-14} \]
m = model(wineind,
m1 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,0, period=12)),
m2 = ARIMA(value ~ pdq(0,0,0) + PDQ(0,1,1, period=12)),
m3 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,1, period=12)),
m4 = ARIMA(value ~ pdq(1,1,0) + PDQ(0,1,1, period=12)),
m5 = ARIMA(value ~ pdq(1,1,2) + PDQ(0,1,1, period=12))
); glance(m)# A tibble: 5 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 7176802. -1527. 3057. 3057. 3064. <cpl [0]> <cpl [0]>
2 m2 6369103. -1517. 3041. 3041. 3050. <cpl [0]> <cpl [12]>
3 m3 10751355. -1553. 3109. 3109. 3115. <cpl [0]> <cpl [12]>
4 m4 8267920. -1532. 3070. 3070. 3079. <cpl [1]> <cpl [12]>
5 m5 5399312. -1497. 3004. 3004. 3020. <cpl [1]> <cpl [14]>
prodn from the astsa packageMonthly Federal Reserve Board Production Index (1948-1978)
Based on the ACF it seems like standard differencing may be required
fr_m = model(
prodn,
m1 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,0, period=12)),
m2_1 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,1, period=12)),
m2_2 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,2, period=12)),
m2_3 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,3, period=12)),
m2_4 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,4, period=12))
)
glance(fr_m)# A tibble: 5 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 2.52 -675. 1353. 1353. 1356. <cpl [0]> <cpl [0]>
2 m2_1 1.62 -599. 1203. 1203. 1210. <cpl [0]> <cpl [12]>
3 m2_2 1.62 -599. 1204. 1204. 1216. <cpl [0]> <cpl [24]>
4 m2_3 1.51 -588. 1183. 1183. 1199. <cpl [0]> <cpl [36]>
5 m2_4 1.51 -587. 1185. 1185. 1204. <cpl [0]> <cpl [48]>
fr_m = model(
prodn,
m1 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,0, period=12)),
m2_3 = ARIMA(value ~ pdq(0,1,0) + PDQ(0,1,3, period=12)),
m3_1 = ARIMA(value ~ pdq(1,1,0) + PDQ(0,1,3, period=12)),
m3_2 = ARIMA(value ~ pdq(2,1,0) + PDQ(0,1,3, period=12))
)
glance(fr_m)# A tibble: 4 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 m1 2.52 -675. 1353. 1353. 1356. <cpl [0]> <cpl [0]>
2 m2_3 1.51 -588. 1183. 1183. 1199. <cpl [0]> <cpl [36]>
3 m3_1 1.34 -566. 1142. 1142. 1161. <cpl [1]> <cpl [36]>
4 m3_2 1.33 -564. 1140. 1140. 1163. <cpl [2]> <cpl [36]>
Monthly cortecosteroid drug sales in Australia from 1992 to 2008.
Sta 344/644 - Fall 2023